Abstract

The multiprocessor effect refers to the loss of computing cycles due 
to processing overhead. Amdahl's law and the Multiprocessing Factor 
(MPF) are two scaling models used in industry and academia for estimat- 
ing multiprocessor capacity in the presence of this multiprocessor effect. 
Both models express different laws of diminishing returns. Amdahl's law 
identifies diminishing processor capacity with a fixed degree of serializa- 
tion in the workload, while the MPF model treats it as a constant geo- 
metric ratio. The utility of both models for performance evaluation stems 
from the presence of a single parameter that can be determined easily 
from a small set of benchmark measurements. This utility, however, is 
marred by a dilemma. The two models produce different results, espe- 
cially for large processor configurations that are so important for today's 
applications. The question naturally arises: Which of these two models 
is the correct one to use? Ignoring this question merely reduces capacity 
prediction to arbitrary curve-fitting. Removing the dilemma requires a 
dynamical interpretation of these scaling models. We present a physical 
interpretation based on queueing theory and show that Amdahl's law cor- 
responds to synchronous queueing in a bus model while the MPF model 
belongs to a Coxian server model. The latter exhibits unphysical effects
such as sublinear response times; hence, we caution against its use for large
multiprocessor configurations.

Keywords: Amdahl's law; benchmarking; multiprocessor effect; perfor-

1 Introduction



The multiprocessor effect is a generic term for the fraction of processing cycles 
usurped by the system (both software and hardware) in order to execute a given 
workload. Typical sources of multiprocessor overhead include: 

1. Operating system code paths (system calls in Unix; supervisor calls in 
MVS) 

2. Exchange of shared writable data between processor caches across the 
system bus 

3. Data exchange between processors across the system bus to main memory 

4. Lock synchronization of accesses to shared writable data 

5. Waiting for an I/O to complete

In the absence of such overhead, the aggregate processor capacity would scale 
linearly. This could occur if there were single-threaded applications running 
on each processor. More commonly, however, diminishing processing capacity
reduces the potential economies of scale offered by symmetric multiprocessors,
a point first observed by Gene Amdahl [1].

The multiprocessor effect can be viewed as a type of interaction between pro- 
cessors as they contend for shared subsystem resources. As more processors are 
added to the backplane (to process more work presumably) system overhead 
increases due to the increasing degree of processor interaction. This interaction 
exhibits itself as incremental capacity falling short of the linear ideal. Therefore, 
any attempt to predict the multiprocessor effect requires a nonlinear function. 

We examine a class of single parameter functions used for estimating multi- 
processor capacity in the presence of the multiprocessor effect. For sizing p 
processors, the capacity functions C(p) must satisfy the following general crite- 
ria: 

1. Concave function of p. 

2. Monotonically increasing. 

3. Vanishes at zero capacity: C(0) = 0. 

4. Bounded above: $C(p) \to \text{const.}$ as $p \to \infty$.

Two members of this class of capacity functions that have widespread
application in industry and are perennially discussed in the literature are:
(i) Amdahl's law

$$C(\sigma, p) = \frac{p}{1 + \sigma (p - 1)} \tag{1}$$

commonly associated with parallel processors, and (ii) the Multiprocessing
Factor (MPF)

$$C(\phi, p) = \frac{1 - \phi^{p}}{1 - \phi} \tag{2}$$

used for sizing multiprocessor platforms, particularly by mainframe
vendors. In both (1) and (2), the respective parameters $\sigma$ and
$\phi$ are real-valued on the open interval (0, 1).
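As an illustrative sketch (not part of the original analysis), Amdahl's law and the MPF model can be coded directly and checked numerically against the four general criteria listed above. The function names, the helper `satisfies_criteria`, and the sample parameter values are assumptions chosen only for illustration.

```python
def amdahl_capacity(sigma: float, p: int) -> float:
    """Amdahl capacity: C(sigma, p) = p / (1 + sigma * (p - 1))."""
    return p / (1.0 + sigma * (p - 1))


def mpf_capacity(phi: float, p: int) -> float:
    """MPF capacity: C(phi, p) = (1 - phi**p) / (1 - phi), for 0 < phi < 1."""
    return (1.0 - phi ** p) / (1.0 - phi)


def satisfies_criteria(capacity, bound: float, max_p: int = 100) -> bool:
    """Numerically check C(0) = 0, monotone increase, concavity and boundedness."""
    values = [capacity(p) for p in range(0, max_p + 1)]
    increments = [b - a for a, b in zip(values, values[1:])]
    vanishes = abs(values[0]) < 1e-12
    increasing = all(d > 0 for d in increments)
    # Concavity on the integers: successive increments never grow.
    concave = all(d2 <= d1 + 1e-12 for d1, d2 in zip(increments, increments[1:]))
    bounded = all(v < bound for v in values)
    return vanishes and increasing and concave and bounded


if __name__ == "__main__":
    # Assumed sample values: sigma = 0.1 and phi = 0.9 share the bound 10.
    print(satisfies_criteria(lambda p: amdahl_capacity(0.1, p), bound=10.0))
    print(satisfies_criteria(lambda p: mpf_capacity(0.9, p), bound=10.0 + 1e-9))
```

The check is only numerical over a finite range of p; it does not replace the analytic argument.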

Both equations express laws of diminishing returns, but they should not be
regarded as laws in the sense of Little's law because they are not
universal. Rather, they reflect a particular set of ad hoc assumptions, which we
shall examine more closely in section 2.

To set the perspective for what follows, we contrast the ad hoc application of (1)
and (2) to multiprocessor sizing with a more principled methodology for sizing
a memory or network buffer. Like C(p), the buffer size, $Q(\rho)$, belongs to a class
of functions that must satisfy similar general criteria:



1. Be a convex function of the load $\rho$.

2. Be monotonically increasing on the interval [0, 1).

3. Vanish at zero load: $Q(0) = 0$.

4. Be unbounded above: $Q(\rho) \to \infty$ as $\rho \to 1$.



The queueing characteristics of different buffer models will have similar but not
identical curves (Fig. XXXXX). In the process of characterizing the buffer size,
one first selects a queueing model (e.g., M/M/1 or M/G/1) based on an
understanding of the buffer dynamics and then validates the corresponding queue
length formula $Q(\rho)$ against measurements. Even if this methodology is not
strictly adhered to on every occasion, one has the option of doing it this way.
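To make the buffer-sizing contrast concrete, the following sketch evaluates the mean number in the system for two candidate queueing models. It assumes the standard Pollaczek-Khintchine mean-value result for an M/G/1 queue, with the squared coefficient of variation of the service time (`cs2`, a name introduced here) selecting the model: `cs2 = 1` recovers M/M/1 and `cs2 = 0` gives M/D/1.

```python
def mg1_queue_length(rho: float, cs2: float) -> float:
    """Mean number in an M/G/1 system at load rho.

    cs2 is the squared coefficient of variation of the service time.
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("load rho must lie in [0, 1)")
    return rho + rho ** 2 * (1.0 + cs2) / (2.0 * (1.0 - rho))


if __name__ == "__main__":
    for rho in (0.25, 0.50, 0.75, 0.95):
        mm1 = mg1_queue_length(rho, cs2=1.0)  # exponential service
        md1 = mg1_queue_length(rho, cs2=0.0)  # deterministic service
        print(f"rho={rho:.2f}  M/M/1={mm1:.3f}  M/D/1={md1:.3f}")
```

Both curves vanish at zero load and diverge as the load approaches 1, yet they differ in between, which is why the model has to be chosen from the buffer dynamics rather than from the curve shape alone.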

Picking either of the processor sizing equations (1) and (2), on the other hand,
is analogous to blindly choosing an ad hoc queue length formula without any
regard for the underlying queueing dynamics. In this sense, it might be more
accurate to refer to (1) and (2) as lores for diminishing returns.

On the other hand, the usefulness of sizing equations like (1) and (2) lies in the
fact that there is only one parameter and it can be determined easily by linear
regression on just a few benchmark measurements [13]. The question arises:
Which is the best choice of parametric model against which to fit the data?
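One possible way to carry out such a single-parameter fit for the Amdahl model is sketched below. It is an assumption of this sketch, not necessarily the procedure used in the cited works, that the model is linearized as p / C(p) - 1 = sigma (p - 1) and fitted by least squares through the origin. The function name and the measurement values are hypothetical.

```python
def fit_amdahl_sigma(measurements):
    """Least-squares estimate of sigma from (p, X(p)) throughput pairs.

    Uses the linearization p / C(p) - 1 = sigma * (p - 1), where
    C(p) = X(p) / X(1), as a regression through the origin.
    """
    x1 = dict(measurements)[1]  # the uniprocessor throughput X(1) is required
    num = den = 0.0
    for p, xp in measurements:
        if p == 1:
            continue  # p = 1 carries no information about sigma
        x = p - 1
        y = p / (xp / x1) - 1.0
        num += x * y
        den += x * x
    return num / den


if __name__ == "__main__":
    # Hypothetical benchmark data generated with sigma = 0.05 and X(1) = 100.
    data = [(p, 100.0 * p / (1.0 + 0.05 * (p - 1))) for p in (1, 2, 4, 8)]
    print(fit_amdahl_sigma(data))
```

With noisy measurements the estimate depends on which configurations are benchmarked, which is one reason the choice of parametric model matters.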

If we assume the leading-order characteristics are the same for small
multiprocessor configurations, Figure 1 shows that their respective asymptotes
are very different and therefore the parametric models predict very different
large-scale configuration capacities. Note that the MPF model saturates before
the Amdahl model under these conditions and it therefore predicts a smaller
overall capacity than the Amdahl model.

Figure 1: Common leading behaviour (panel title: Equivalent Leading Order).

Alternatively, we could consider the other extreme shown in Figure 2, where both
models approach the same asymptote for large multiprocessor configurations.
Now, the faster saturation of the MPF model means that maximal capacity is
reached at smaller configurations than predicted by the Amdahl model.

We take the position that such questions should be addressed on purely physical
grounds; otherwise, multiprocessor capacity predictions are reduced to an
exercise in mere curve fitting. The problem is that no consistent physical
interpretation of these parametric models exists.

Elsewhere [16], this author has shown how these parametric scaling models
could be expanded as a finite series in which each term has a distinct pictorial
representation. This led to the conclusion that (1) can be regarded as
representing a "broadcast" protocol while (2) can be regarded as representing a
"bucket brigade" protocol [17]. The latter is less than satisfactory because
it appears quite unphysical when compared to the way actual multiprocessor
systems operate.

Figure 2: Common asymptote.



In this paper, we present a more consistent interpretation based on queue- 
ing models. The usual difficulty with modeling multiprocessors as elementary 
queues (e.g., M/M/m) is that they do not account for "interference" effects be- 
tween the processors (the so-called multiprocessor effect). We shall overcome 
this limitation in two distinct ways: 

1. Multiprocessor Speedup will be identified with a bus-oriented M/M/1//p
queueing model where processor interference is represented as communication
delays across the bus.

2. The Multiprocessing Factor will be identified with a processor-oriented
M/G/1 model representing the run-queue where multiprocessor interference
is associated with a staged service distribution.

Single-class workloads are assumed throughout since that will prove sufficient
for the analysis of (1) and (2).

Just as queueing delays for elementary queues can have vastly different analytic
forms (assuming a closed analytic form exists), it would be useful to select the
parametric sizing model on the basis of the underlying queueing dynamics along
the lines indicated earlier for the sizing of buffers.



2 Multiprocessor Scalability

We begin by briefly reviewing the conventional intuition behind the
single-parameter sizing models in (1) and (2).



2.1 Multiprocessor Speedup



Amdahl's law [1] is well known and frequently cited in the context of parallel
processing performance [19], where it is also known as the speedup. The
underlying notion is that for a fixed workload size (see footnote 1) there is a
fraction $\sigma \in (0, 1)$ of the workload for which the execution time remains
constant as p increases. Ultimately, this fraction dominates the speedup
function, causing it to become sublinear.



In the subsequent queueing analysis, it will be more useful to use the dual
representation of processing capacity based on relative throughput or scaleup [17]:

$$C(p) = \frac{X(p)}{X(1)} \tag{3}$$

Equation (3) reflects the motivation for selling multiprocessors that support
commercial applications. There, the goal is to accommodate incremental user
growth through the purchase of increased processor capacity while minimizing
the degradation to single-user responsiveness.

Assume the number of users (N) per processor is fixed (i.e., N/p = const.). Let
$R_1$ be the mean response time experienced by N users on a single processor. We
would like to maintain the response time at $R_1$, but on adding another processor
with N users (now 2N total users across 2 processors), we find $R_2 > R_1$ due to
the multiprocessor effect.



Defining the number of completed transactions per processor as c, the
uniprocessor throughput is $X(1) = c/R_1$. For 2 processors, the throughput becomes
$X(2) = 2c/R_2$, where the response time $R_2$ with 2 processors is slightly longer
than $R_1$ by a fractional amount $\sigma R_1$. In other words,
$X(2) = 2c/[(1 + \sigma) R_1]$. For 3 processors we have
$X(3) = 3c/[(1 + 2\sigma) R_1]$.

Generalizing to p processors, the throughput is $X(p) = p\,c/R_p$, where
$R_p = R_1 + (p - 1)\,\sigma R_1$ accounts for the fractional increase in response
time due to the activity of users on the other (p - 1) processors. Substituting
X(1) and X(p) into (3) produces:

$$C(\sigma, p) = \frac{X(p)}{X(1)} = \frac{R_1}{c} \, \frac{p\,c}{R_1 + (p - 1)\,\sigma R_1} \tag{4}$$

which, after the elimination of the common factors c and $R_1$, is identical to (1).
The asymptotic capacity is:

$$\lim_{p \to \infty} C(\sigma, p) = \frac{1}{\sigma} \tag{5}$$
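The response-time argument above can be traced step by step in a few lines of code. This is a sketch under the stated assumptions only: the function name and the default values of `r1` and `c` are illustrative, and the result should not depend on them because both cancel in the ratio.

```python
def scaleup_from_response_times(sigma: float, p: int,
                                r1: float = 1.0, c: float = 1.0) -> float:
    """Relative capacity X(p) / X(1) built from the response-time argument."""
    rp = r1 + (p - 1) * sigma * r1   # R_p = R_1 + (p - 1) * sigma * R_1
    x1 = c / r1                      # uniprocessor throughput X(1) = c / R_1
    xp = p * c / rp                  # p-processor throughput X(p) = p * c / R_p
    return xp / x1


if __name__ == "__main__":
    sigma = 0.1  # assumed sample value
    for p in (1, 2, 3, 10, 100, 10_000):
        print(p, round(scaleup_from_response_times(sigma, p), 4))
    # The printed values level off near 1 / sigma as p grows.
```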

1 Ref. [20] noted that a workload scaled to the number of processors could recover
linear behaviour under certain ideal circumstances. We shall not consider such
exceptional cases here.

The reason that the expressions for the speedup in (1) and the scaleup in (4)
are identical (i.e., duals of each other) follows from:

$$\frac{(1 - \sigma)/p}{\sigma} = \frac{(1 - \sigma)}{p\,\sigma}$$

The key quantity that determines the sublinear capacity is the ratio of the
"parallel" portion $(1 - \sigma)$ to the "serial" portion $\sigma$ of the workload.
With respect to that ratio, it is inconsequential whether the parallel portion is
scaled down by p or the serial portion is scaled up by p. The effect on the ratio
is the same.



2.2 Multiprocessing Factor 

The multiprocessing factor (MPF) is intended as a measure of how much effective
processor capacity is available (or lost) as more processors are added
to the backplane. Consider a workload running on a uniprocessor that has
a measured throughput of X(1) = 100 transactions per second (TPS). When
run on a dual processor, the aggregate throughput is measured as X(2) =
180 TPS. Since X(2) is less than double X(1), this loss can be expressed as
$180 = (1 + \phi)\,100$ TPS, where the quantity $\phi = 0.8$ is the MPF (see
footnote 2). The second processor contributes only 80 percent of the capacity of
the first processor. Continuing along these lines, a third processor would be
expected to contribute only 80% of the second processor's contribution, i.e.,
64 TPS. The aggregate throughput is then
$X(3) = X(1) + \phi X(1) + \phi(\phi X(1)) = 244$ TPS.

Figure 3: Different MPF factors.
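The cumulative arithmetic of the worked example can be reproduced with a short loop. The function name is an assumption of this sketch; the input values X(1) = 100 TPS and X(2) = 180 TPS are the ones quoted in the text.

```python
def mpf_throughputs(x1: float, phi: float, max_p: int) -> list:
    """Aggregate throughputs X(1), ..., X(max_p) when every added processor
    contributes phi times the contribution of the previous one."""
    totals = []
    contribution = x1
    total = 0.0
    for _ in range(max_p):
        total += contribution
        totals.append(total)
        contribution *= phi
    return totals


if __name__ == "__main__":
    x1, x2 = 100.0, 180.0
    phi = x2 / x1 - 1.0  # MPF implied by the two measurements
    for p, x in enumerate(mpf_throughputs(x1, phi, 5), start=1):
        print(p, round(x, 2))
```

For these inputs the third entry is approximately 244 TPS, in agreement with the hand calculation.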

2 Notice that this value differs from that which would be obtained by taking the
simple arithmetic average $\frac{1}{2} \cdot \frac{180}{100} = 0.90$.

Generalizing this cumulative procedure and applying the definition in (3)
produces:

$$C(\phi, p) = 1 + \phi + \phi^2 + \ldots + \phi^{p-1} \tag{6}$$

which is equivalent to (2) for $\phi < 1$ since it is a finite geometric sum. The
asymptotic capacity is:

$$\lim_{p \to \infty} C(\phi, p) = \frac{1}{1 - \phi} \tag{7}$$

If $\phi = 1$ (no MPF), then

$$C(1, p) = \sum_{k=0}^{p-1} \phi^k = p \tag{8}$$

which is a linearly rising function representing ideal multiprocessor scalability.
For the purposes of comparison, (1) can also be written as a finite series

$$C(\sigma, p) = 1 + A_1 + A_2 + \ldots + A_{p-1} \tag{9}$$

where

$$A_i = \frac{1 - \sigma}{1 + \sigma (p - 1)}, \qquad i = 1, 2, \ldots, (p - 1).$$

Unfortunately, (9) is not a power series (see footnote 3) like (6), so the choice
of scaling equation is not obvious even in a series representation. Other
ambiguities persist. (1) and (2) could be matched either at leading order by
setting $C(\sigma, 1) = C(\phi, 1)$, as shown in Figure 1, or they could be matched
asymptotically by setting $\sigma = (1 - \phi)$, as shown in Figure 2.
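The two matching conventions can be compared numerically. Because both capacity functions equal 1 at p = 1 for any parameter value, this sketch assumes that the small-configuration match is imposed at p = 2, which gives sigma = (1 - phi) / (1 + phi); that choice, the function names and the sample value phi = 0.8 are assumptions made for illustration. The asymptotic match sigma = 1 - phi is the one stated in the text.

```python
def amdahl_capacity(sigma: float, p: int) -> float:
    return p / (1.0 + sigma * (p - 1))


def mpf_capacity(phi: float, p: int) -> float:
    return (1.0 - phi ** p) / (1.0 - phi)


def sigma_matched_at_two(phi: float) -> float:
    """Assumed small-configuration match: C(sigma, 2) == C(phi, 2)."""
    return (1.0 - phi) / (1.0 + phi)


def sigma_matched_asymptotically(phi: float) -> float:
    """Asymptotic match: 1 / sigma == 1 / (1 - phi)."""
    return 1.0 - phi


if __name__ == "__main__":
    phi = 0.8
    s_lead = sigma_matched_at_two(phi)
    s_asym = sigma_matched_asymptotically(phi)
    print("p   MPF     Amdahl(lead)  Amdahl(asym)")
    for p in (1, 2, 4, 8, 16, 64):
        print(p, round(mpf_capacity(phi, p), 3),
              round(amdahl_capacity(s_lead, p), 3),
              round(amdahl_capacity(s_asym, p), 3))
```

Under the first match the MPF curve falls below the Amdahl curve at large p; under the second it sits above the Amdahl curve at intermediate p while sharing the same limit.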



3 Queueing Dynamics 

In this section, we develop queueing models to resolve the ambiguities described 
above. 



3.1 Bus-oriented Model 



The bus-oriented model comprises a closed queueing network, or Repairman
model [21], containing a finite number (p) of requests and K queueing centers
(K = 1 repair station with mean service demand D will be sufficient for our
discussion).

3 The denominator in (1) can be expanded as a power series but it is an infinite series.




Figure 4: Repairman model (diagram label: X(p)).



The requests can be thought of as memory references [22] issued by p processors,
each of which executes in "parallel" for a mean time (Z). The queueing center
represents a "serial" bus or other interconnect network [23] by which the
processors can communicate or transfer data. Typically, we expect $D \ll Z$ to
hold because the mean execution periods should exceed the mean transit times
across the bus.



System throughput (X) and communication latency (R) are related by:

$$X(p) = \frac{p}{R(p) + Z} \tag{10}$$

The saturation bound $X_{\max} = 1/D$ represents the maximum throughput the
multiprocessor can achieve.



3.1.1 Synchronous Requests 

The worst-case bound [24] on multiprocessor throughput ($X_{\min}$) occurs when
all p processors issue synchronous communication requests. Then $R(p) = p\,D$
(maximal queueing) and (10) becomes:

$$X_{sync}(p) = \frac{p}{p\,D + Z} \tag{11}$$

Using the definition in (3), we can use (11) to write:

$$C_{sync}(p) = \frac{p\,(D + Z)}{p\,D + Z} = \frac{p}{p \left( \frac{D}{D + Z} \right) + \left( \frac{Z}{D + Z} \right)} \tag{12}$$

Rearranging terms and simplifying produces:

$$C_{sync}(p) = \frac{p}{1 + \left( \frac{D}{D + Z} \right) (p - 1)} \tag{13}$$

Figure 5: Capacity of the Repairman and Amdahl models.



We immediately recognize (13) as a version of (1) where the parameter $\sigma$ is now
identified with the queueing parameters D and Z via the ratio:

$$\sigma = \frac{D}{D + Z} \tag{14}$$

The range of values for $\sigma$ in (14) evidently corresponds to

1. $\sigma \to 0$ as $D \to 0$ (zero latency)

2. $\sigma \to 1$ as $Z \to 0$ (zero execution)

Equation (14) establishes that $C_{sync}(p)$ in (13) is identical to $C(\sigma, p)$
in (1). Although the queue-theoretic bound (11) on throughput is known [24], its
relationship to the Amdahl scaleup (1) seems not to have been discussed in the
literature.

The bus-oriented queueing model also supports an earlier conclusion [16], [17]
that Amdahl's law can be interpreted as representing a kind of "broadcast
protocol" where all execution simultaneously halts while processors exchange
messages across the communication fabric.
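The identification of the synchronous bound with Amdahl's law can be checked numerically, and compared with the exact throughput of the same closed network. The sketch below assumes exponentially distributed service at the bus so that standard exact mean value analysis (MVA) applies; the function names and the sample values D = 1, Z = 9 are illustrative assumptions.

```python
def amdahl_capacity(sigma: float, p: int) -> float:
    return p / (1.0 + sigma * (p - 1))


def sync_capacity(p: int, d: float, z: float) -> float:
    """Relative capacity from the synchronous bound X_sync(p) = p / (p*D + Z)."""
    def x_sync(n: int) -> float:
        return n / (n * d + z)
    return x_sync(p) / x_sync(1)


def repairman_throughput(p: int, d: float, z: float) -> float:
    """Exact MVA for one queueing center (demand D) plus think time Z."""
    queue = 0.0
    throughput = 0.0
    for n in range(1, p + 1):
        response = d * (1.0 + queue)     # arrival theorem
        throughput = n / (response + z)  # response time law
        queue = throughput * response    # Little's law
    return throughput


if __name__ == "__main__":
    d, z = 1.0, 9.0
    sigma = d / (d + z)
    x1 = repairman_throughput(1, d, z)
    for p in (1, 2, 4, 8, 16, 32):
        exact = repairman_throughput(p, d, z) / x1
        print(p, round(exact, 3), round(sync_capacity(p, d, z), 3),
              round(amdahl_capacity(sigma, p), 3))
```

In this sketch the exact relative capacity lies on or above the synchronous (Amdahl) curve, consistent with the latter being a worst-case bound.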



3.1.2 Response Times 

The general response time for the Repairman model in Fig. 4 is given by:

$$R(p) = \frac{p}{X(p)} - Z \tag{15}$$

The response characteristics for the bus-oriented model can be determined by
substituting (11) into (15) and simplifying:

$$R_{sync}(p) = p\,D \tag{16}$$

Figure 6: Response characteristics of the Repairman and Amdahl models.

We see that the relative response time

$$\frac{R_{sync}(p)}{R_{sync}(1)} = p \tag{17}$$

corresponding to the Amdahl bound is a linear function of p (Fig. 6) and
independent of $\sigma$ because the system is already in severe saturation due to
synchronized queueing.
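The linear growth of the synchronous response time can be verified by rearranging (10) for R(p) and inserting the synchronous throughput. This is a minimal sketch; the function names and the sample values of D and Z are assumptions.

```python
def response_time(p: int, throughput: float, z: float) -> float:
    """Response time law for the closed model: R(p) = p / X(p) - Z."""
    return p / throughput - z


def sync_throughput(p: int, d: float, z: float) -> float:
    """Synchronous (worst-case) throughput: X_sync(p) = p / (p*D + Z)."""
    return p / (p * d + z)


if __name__ == "__main__":
    d, z = 0.5, 20.0  # assumed bus demand and mean execution time
    r1 = response_time(1, sync_throughput(1, d, z), z)
    for p in (1, 2, 4, 8, 16):
        rp = response_time(p, sync_throughput(p, d, z), z)
        print(p, round(rp, 6), round(rp / r1, 6))
```

The ratio in the last column tracks p regardless of the values chosen for D and Z, which is the sense in which the relative response time is independent of sigma.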



3.2 Processor-oriented Model 



The Coxian distribution [21] represents a type of composite server (see Fig. 7)
with staged exponentially distributed service rates $\mu_i$ for
$i = 1, 2, \ldots, p$ stages, and probability

$$A_i = \prod_{j=0}^{i-1} a_j$$

of advancing to the i-th server and branching probability $b_i$ of exiting after the
i-th server. The next request cannot enter the service facility until the current
request has either completed all stages or exited after the i-th stage. Consequently,
there is no queueing at any of the Coxian stages.

Figure 7: Coxian server.



The expected service time (first moment) is:

$$E\{S\} = \ldots \tag{18}$$

with variance:

$$\mathrm{Var}\{S\} = E\{S^2\} - E^2\{S\} \tag{19}$$

where the second moment is given by:

$$E\{S^2\} = \sum A_i\, b_i \ldots \tag{20}$$

A well-known [21] special case is the Erlang-k distribution where all
$\mu_i = \mu$ and all $b_i = \ldots$ and squared coefficient of variation
$C^2\{S\} = 1/p$ [25].



3.2.1 Uniform Coxian 



The special case of interest to us is $\mu_i = \mu$, $a_i = \phi$ with
$a_p = \ldots$ and $b_i = 1$ with $b_0 = 0$. We shall refer to this as a uniform
Coxian distribution. Then (18) reduces to:

$$E\{S\} = \frac{1}{\mu} + \frac{\phi}{\mu} + \frac{\phi^2}{\mu} + \ldots + \frac{\phi^{p-1}}{\mu} \tag{21}$$

Using Little's law $U = \lambda E\{S\}$, (21) can be rewritten as:

$$U(\phi, p) = \frac{\lambda}{\mu} \left( \frac{1 - \phi^{p}}{1 - \phi} \right) \tag{22}$$

Since $U(\phi, p) > 1$, (22) represents the total utilization of the uniform Coxian
service facility. It is bounded above by

$$\frac{\rho}{1 - \phi}$$

where $\rho = \lambda/\mu$. Moreover, (2) can now be expressed in terms of (22) as:

$$\rho\, C(\phi, p) = U(\phi, p) \tag{23}$$

Hence, the MPF capacity model presented in section 2.2 can also be interpreted
as the total utilization of a p-stage uniform Coxian server. For a single-stage
server p = 1 and (23) reduces to $U(\phi, 1) = \rho$, as expected.

In this queueing model, the finite geometric series in (21) (see footnote 4)
arises from the branching process within the service center (not the arrivals
process). This branching represents the loss of service after some number of
processor cycles due to system overhead. The total utilization $U(\phi, p)$
corresponds to the average impact of that loss.
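The relation between the MPF capacity and the total utilization of the uniform Coxian server can be checked with a direct summation of the geometric series for the mean service time. The function names and the sample values of the arrival rate, service rate and phi are assumptions of this sketch.

```python
def uniform_coxian_mean_service(mu: float, phi: float, p: int) -> float:
    """E{S} = (1/mu) * (1 + phi + phi**2 + ... + phi**(p - 1))."""
    return sum(phi ** k for k in range(p)) / mu


def total_utilization(lam: float, mu: float, phi: float, p: int) -> float:
    """Total utilization U(phi, p) = lam * E{S}."""
    return lam * uniform_coxian_mean_service(mu, phi, p)


def mpf_capacity(phi: float, p: int) -> float:
    """Closed form of the finite geometric sum, for 0 < phi < 1."""
    return (1.0 - phi ** p) / (1.0 - phi)


if __name__ == "__main__":
    lam, mu, phi = 0.75, 1.0, 0.9  # assumed sample values
    rho = lam / mu
    for p in (1, 2, 5, 10, 50):
        u = total_utilization(lam, mu, phi, p)
        print(p, round(u, 4), round(rho * mpf_capacity(phi, p), 4))
    print("upper bound:", rho / (1.0 - phi))
```

The two printed columns agree, and both remain below the bound rho / (1 - phi).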



3.2.2 Response Times 

The corresponding response times (Fig. 8) for the uniform Coxian model can be
calculated as an M/G/1 queue using the Pollaczek-Khintchine formula:

$$R(\phi, p) = E\{S\} \left[ 1 + \frac{\rho \left( 1 + C^2\{S\} \right)}{2 (1 - \rho)} \right] \tag{24}$$

where the squared coefficient of variation

$$C^2\{S\} = \frac{\mathrm{Var}\{S\}}{E^2\{S\}}$$

is defined in terms of (18) and (20) and lies in the range
$1/p < C^2\{S\} < 1$. As expected, the uniform Coxian model represents a
hypoexponential server. The variance in the service time is smaller than it
would be for an M/M/1 queue.

The response time (24) is plotted as a function of $\rho$ in Figure 8 for a fixed
value of $\phi$. It has the typical characteristic expected of an open-class queue.
With only a single stage and probability 1 of advancement ($\phi = 0.98$),
$R(\rho)$ is close to that of an M/M/1 queue since $C^2\{S\} \simeq 1$. As more
stages are added, the response time at any load $\rho$ increases, as shown by the
curves for p = 10 and p = 50. Note, however, that the progressive increase at
that load becomes smaller as the number of stages increases. This effect can be
seen more clearly in Figure 9, which shows response times plotted as a function
of p for a fixed load $\rho = 0.75$ up to 100 stages, representing a large-scale
multiprocessor. A surprising feature, for modeling multiprocessors, is that the
response time characteristics are sublinear for all $\phi < 1$. Contrast this with
the response time characteristics in Figure 6.

4 The geometric series in (21) should not be confused with the geometrically
distributed probability $\rho^k (1 - \rho)$ of finding k customers in an M/M/1 queue.

Figure 8: Response time $R(\rho)$ as a function of $\rho$.

Only for the special case $\phi = 1$ (Erlang-p) does the response time increase
linearly because there all the processor work is accounted for, i.e.,
$U(\phi, p) = p\,\rho$. That case, however, is tantamount to linear scalability
in (8), which ignores the MP effect and is therefore of little value for
multiprocessor sizing.
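The sublinear growth of the response time with the number of stages can be explored numerically. The sketch below rests on an assumption that is ours rather than the text's: a request is taken to leave after stage i with probability phi^(i-1) (1 - phi) for i < p, and to complete all p stages with probability phi^(p-1), so that the service time is a mixture of Erlang distributions with common rate mu. This reproduces the geometric-series mean service time, and the second moment follows from the Erlang moments. The load rho is treated as a fixed parameter, as in the plots described above; all function names are illustrative.

```python
def stage_exit_probabilities(phi: float, p: int) -> list:
    """Assumed probabilities of leaving after stage 1, 2, ..., p."""
    probs = [phi ** (i - 1) * (1.0 - phi) for i in range(1, p)]
    probs.append(phi ** (p - 1))  # the request completes all p stages
    return probs


def service_moments(mu: float, phi: float, p: int):
    """First and second moments of the Erlang-mixture service time."""
    probs = stage_exit_probabilities(phi, p)
    mean = sum(q * i for i, q in enumerate(probs, start=1)) / mu
    second = sum(q * i * (i + 1) for i, q in enumerate(probs, start=1)) / mu ** 2
    return mean, second


def squared_cv(mu: float, phi: float, p: int) -> float:
    mean, second = service_moments(mu, phi, p)
    return (second - mean ** 2) / mean ** 2


def pk_response_time(rho: float, mu: float, phi: float, p: int) -> float:
    """Pollaczek-Khintchine response time at a fixed load rho."""
    mean, _ = service_moments(mu, phi, p)
    c2 = squared_cv(mu, phi, p)
    return mean * (1.0 + rho * (1.0 + c2) / (2.0 * (1.0 - rho)))


if __name__ == "__main__":
    rho, mu, phi = 0.75, 1.0, 0.98
    for p in (1, 10, 50, 100):
        print(p, round(squared_cv(mu, phi, p), 3),
              round(pk_response_time(rho, mu, phi, p), 2))
```

Under these assumptions the response time at p = 100 is well below 100 times its single-stage value, whereas setting phi = 1 makes it grow essentially in proportion to p.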

The queue-theoretic attributes of the MPF model can be summarized as follows: 

1. Only one request at a time can enter the Coxian server. 

2. Multiprocessor overhead is treated as a probabilistic loss of work. 

3. Processor utilization due to the MP effect is unaccounted for. 

4. Service periods are hypo-exponential. 

5. Response times become sublinear with an increasing number of processors.

These characteristics appear counter-intuitive as a model of multiprocessor scal- 
ability. 



4 Conclusions 

Based on our queueing analysis of these multiprocessor models, we are now in a
position to say something about the applicability of the bus-oriented (Amdahl)
model defined by (1) and the server-oriented (MPF) model defined by (2).

If matched at small processor configurations, both capacity models are essen- 
tially indistinguishable when fitted to benchmark data. As configurations be- 
come larger, however, the MPF model becomes pessimistic relative to the Am- 
dahl model. This appears contradictory when we recall that Amdahl scaling 
corresponds to the worst-case bound of the more constrained closed queueing 
model. 

Capacity scaling for the bus-oriented (Amdahl) model in section 3.1 is an
explicit function of the system throughput X(p). Response times for the
bus-oriented (Amdahl) model will have the classic "hockey-stick" shape due to
the negative feedback effects of a finite number of requests in the closed
queueing network. Such response time curves are associated with the constraint
that no more than one bus request per processor can be outstanding. Utilizations
of both the bus, U(p), and the processors, Z X(p), are accounted for explicitly.

Based on the discussion in section 3.2, the relative capacity for the
server-oriented (MPF) model is equivalent to the total utilization U(p) of a
p-stage Coxian server. For a given value of $\phi$, the total utilization becomes
sublinear with increasing stages because the likelihood diminishes that a
request will visit all stages. In this model, multiprocessor overhead is treated
as a loss of serviceable work.



Considered as an M/G/1 queue, the multiprocessor is represented as a single
Coxian server with processor interference accounted for by the variance in the
service period. That only one request can enter the Coxian server at a time is
already unrealistic for a model of a multiprocessor, but a variance in the service
periods that is less than that of an exponential server (i.e., hypoexponential)
seems contradictory to expectations for a model of the multiprocessor effect.
M/G/1 queues with $C^2\{S\} \gg 1$ (i.e., non-Coxian) have been used to model
disk storage and token ring networks [25]; however, we need the Coxian stages to
account for the geometric series in (2).

A hyperexponential Coxian would also produce higher variance $C^2\{S\} > 1$ in
the service periods, but it is well known ([21], [25]) that just a few parallel
stages are sufficient for that and thus p could no longer be associated
explicitly with the number of processors. Moreover, a hyperexponential Coxian
does not produce a mean service time that has the geometric series required to
account for (2).

The response time for the Coxian server model becomes sublinear as the pro- 
cessor configuration is expanded. This is unlikely to be seen in benchmark 
measurements of real multiprocessors. Such an unphysical effect follows from 
the fact that the utilization due to lost processor work is unaccounted for in the 
Coxian model. In reality, one expects multiprocessor overhead to be accrued 
as processor kernel time rather than processor user time. The total processor 
utilization is the sum of both contributions but the uniform Coxian server does 
not account for kernel time in the workload. 



Finally, we suggest that neither of the models considered here is truly sufficient
as a general model of multiprocessor scalability. Elsewhere, we have already
proposed a two-parameter model:

$$C(\alpha, \beta, p) = \frac{p}{1 + \alpha \left[ (p - 1) + \beta\, p\, (p - 1) \right]} \tag{25}$$

in which the $\alpha$ parameter is identified with queueing delays and the $\beta$
parameter with additional delays due to pairwise coherency [27] mismatches. The
latter induces retrograde throughputs $C(\alpha, \beta, p) \to 1/p$ as
$p \to \infty$ that are indeed seen in multiprocessor capacity measurements [28].
Retrograde throughput cannot be modeled parametrically using either (1) or (2),
nor can it be represented using conventional queueing theory without the
introduction of load-dependent servers such that $E\{S\} \sim 1/p$. In the limit
where coherency penalties vanish ($\beta = 0$), (25) reduces to the Amdahl model
(with $\alpha = \sigma$) in (1). As we have demonstrated here, Amdahl's law has a
natural physical interpretation as synchronous queueing within a Repairman
model. The two-parameter function (25) can be viewed as a load-dependent
extension of those queueing dynamics.
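The qualitative behaviour of the two-parameter model, namely its reduction to Amdahl's law when the coherency term vanishes and the retrograde throughput otherwise, can be illustrated as follows. The function names, the brute-force search for the peak, and the sample values alpha = 0.05 and beta = 0.01 are assumptions of this sketch.

```python
def two_parameter_capacity(alpha: float, beta: float, p: int) -> float:
    """C(alpha, beta, p) = p / (1 + alpha * ((p - 1) + beta * p * (p - 1)))."""
    return p / (1.0 + alpha * ((p - 1) + beta * p * (p - 1)))


def amdahl_capacity(sigma: float, p: int) -> float:
    return p / (1.0 + sigma * (p - 1))


def peak_configuration(alpha: float, beta: float, max_p: int = 1024) -> int:
    """Processor count in 1..max_p with the largest capacity (brute force)."""
    return max(range(1, max_p + 1),
               key=lambda p: two_parameter_capacity(alpha, beta, p))


if __name__ == "__main__":
    alpha, beta = 0.05, 0.01
    p_star = peak_configuration(alpha, beta)
    print("peak at p =", p_star)
    for p in (1, 8, p_star, 256, 1024):
        print(p, round(two_parameter_capacity(alpha, beta, p), 3),
              round(amdahl_capacity(alpha, p), 3))
```

Beyond the peak the capacity falls as processors are added, whereas the Amdahl column keeps rising toward its asymptote.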



Although we have been able to show that the MPF scaling equation (2) belongs
to an M/G/1 queueing model with a load-dependent Coxian server, that load
dependence is not of the correct type for modeling multiprocessor overhead
because it gives rise to unphysical effects. We therefore caution against its use
for large-scale multiprocessor servers.